This page contains the code used to analyze personal networks (ego networks) on Twitter.
library(rtweet)
source("createTokens.R") ## private keys and tokens
source("rtweet_functions.R") ## helper functions for working with multiple tokens
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggwordcloud)
library(tidytext)
theme_set(theme_custom())
The first step is to choose a focal user (the “ego”) around whom we build a personal network.
ego <- "Danielramirezzr" # Daniel Ramírez
ego_info <- lookup_users(ego, token = sample(token, 1))
ego_info$followers_count
## [1] 3197
Name: DanielX
Username: Danielramirezzr
Followers: 3197
Friends: 570
Joined Twitter on 2015-03-14 15:58:27
This analysis is divided into three parts: the network of followers, the network of friends, and the network of mutual connections.
Each of these three dimensions corresponds to a different flow of interaction. The first consists of the users who receive information from Danielramirezzr, the second of the users who produce the information that Danielramirezzr receives, and the third of the users with whom the flow of information is reciprocal.
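As a rough illustration of these three groups, the sketch below uses made-up user ids (not real accounts) and base R set operations. The vector names are hypothetical and only meant to show how the three groups relate to each other.

```r
# Hypothetical ids, for illustration only
followers <- c("101", "102", "103", "104")  # accounts that follow the ego
friends   <- c("103", "104", "105")         # accounts the ego follows

# Mutuals: accounts that appear in both lists (reciprocal ties)
mutuals <- intersect(followers, friends)

# One-way ties in each direction
only_followers <- setdiff(followers, friends)
only_friends   <- setdiff(friends, followers)

print(mutuals)
```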
This code is freely available, except for the private keys and tokens, which can be obtained by opening a developer account at https://developer.twitter.com/
The following code retrieves the list of Danielramirezzr's followers (each one identified by a user_id).
ego_followers <- get_followers(ego, token = sample(token, 1))
ego_followers
## # A tibble: 3,197 x 1
## user_id
## <chr>
## 1 2514365882
## 2 119181290
## 3 1326656879821545474
## 4 1730577278
## 5 1220145256848666626
## 6 52419142
## 7 1325594948591374338
## 8 870698812884492289
## 9 1319759601861120007
## 10 1315994406156218368
## # … with 3,187 more rows
This user_id is unique to each account and stays the same even when the user decides to change their name.
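One practical reason to keep user_id as a character column, as in the output above, is that recent ids are too large to be stored exactly as numbers in R. The sketch below assumes a second, hypothetical id that differs from a real one by a single unit.

```r
# The first id appears in the follower list; the second is hypothetical (first id + 1)
id_a <- "1326656879821545474"
id_b <- "1326656879821545475"

# As character strings they remain distinct
distinct_as_text <- id_a != id_b

# Converted to double they collapse into the same value, because doubles
# carry only about 15 to 16 significant decimal digits
same_as_double <- as.numeric(id_a) == as.numeric(id_b)

print(c(distinct_as_text, same_as_double))
```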
The following code creates a folder called *_friends_of_followers/ that stores, for each of these users, the list of their friends (the accounts they follow).
Depending on the number of users and the number of tokens, this can take several hours (or even days).
outfolder <- paste0(ego, "_friends_of_followers/")
if (!dir.exists(outfolder)) dir.create(outfolder)
## users already downloaded in a previous run (files are named <user_id>.rds)
users_done <- str_replace(dir(outfolder), "\\.rds$", "")
users_left <- setdiff(ego_followers$user_id, users_done)
while (length(users_left) > 0) {
  new_user <- users_left[[1]]
  ## try() keeps the loop running when a call fails; the error object is saved instead
  friends_of_user <- try(multi_get_friends(new_user, token))
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- setdiff(users_left, new_user)
}
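The loop above can be interrupted and restarted because it checks which files already exist before downloading anything. A minimal, self-contained sketch of that pattern follows; `fake_download()` is a hypothetical stand-in for the API call and the folder lives in a temporary directory.

```r
# Hypothetical stand-in for the API call; it fails for one id to mimic a protected account
fake_download <- function(id) {
  if (id == "b") stop("protected account")
  data.frame(from = id, to = c("x", "y"))
}

out_dir <- file.path(tempdir(), "resume_demo")
dir.create(out_dir, showWarnings = FALSE)

ids <- c("a", "b", "c")

# Anything already saved in a previous run is skipped
done <- sub("\\.rds$", "", list.files(out_dir))
todo <- setdiff(ids, done)

for (id in todo) {
  # try() returns an object of class "try-error" instead of stopping the loop
  result <- try(fake_download(id), silent = TRUE)
  saveRDS(result, file.path(out_dir, paste0(id, ".rds")))
}

# Repeating the bookkeeping shows there is nothing left to do
done <- sub("\\.rds$", "", list.files(out_dir))
remaining <- setdiff(ids, done)
print(remaining)
```

Saving the failed result as well means a protected account is not requested again on every restart.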
For some users this information cannot be retrieved because their accounts are protected.
In this case, no information could be retrieved for 16.9% of Danielramirezzr's followers.
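Because the download loop wraps each call in try(), a failed request is stored as an object of class "try-error". The sketch below, built on made-up data, shows one way to separate failures from successes and to compute a share of failures like the one quoted above; it is an illustration, not necessarily how that figure was produced.

```r
# A list mixing successful results with a failed call, as try() would produce
results <- list(
  data.frame(from = "a", to = "x"),
  try(stop("protected account"), silent = TRUE),
  data.frame(from = "c", to = "y")
)

# Flag the failures with inherits(), then keep everything else
failed <- vapply(results, inherits, logical(1), what = "try-error")
clean  <- results[!failed]
edges  <- do.call(rbind, clean)

# Share of users that could not be retrieved, in percent
pct_failed <- round(100 * mean(failed), 1)
print(pct_failed)

# Caveat: negative integer indexing drops everything when there are no failures
no_failures <- list(1, 2, 3)
length(no_failures[-integer(0)])  # 0, not 3
```

A logical index (`results[!failed]`) avoids that caveat, which is why it is used here instead of `results[-which(failed)]`.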
To build the network, we take the full list of users and their friends and arrange it in two columns, where each row indicates one user (from) following another user (to).
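edge_list <- list.files(outfolder, full.names = TRUE) %>%
  map(read_rds)
## flag the downloads that failed (e.g., protected accounts) and drop them
error_index <- map_lgl(edge_list, ~ inherits(.x, "try-error"))
edge_list <- edge_list[!error_index] %>%
  bind_rows()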
edge_list
## # A tibble: 4,287,491 x 2
## from to
## <chr> <chr>
## 1 100049987 1557085998
## 2 100049987 249409369
## 3 100049987 56408044
## 4 100049987 1074674327646339072
## 5 100049987 1244686639558840320
## 6 100049987 1205685150
## 7 100049987 1280251338
## 8 100049987 741375844849979392
## 9 100049987 1104449323012685824
## 10 100049987 1011948205
## # … with 4,287,481 more rows
There are 4,287,491 connections here. However, they include connections to users beyond those who follow Danielramirezzr.
ego_followers_info <- lookup_users(ego_followers$user_id, token = sample(token, 1))
write_rds(ego_followers_info, paste0(ego, "_follower_info.rds"), compress = "gz")
We can also retrieve metadata about each user.
ego_followers_info <- read_rds(paste0(ego, "_follower_info.rds")) %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
id_dict <- ego_followers_info %>%
select(name, screen_name) %>%
deframe()
For example, this is the information for the followers of Danielramirezzr who have the largest number of followers themselves.
ego_followers_info %>%
arrange(desc(followers_count)) %>%
select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 2,634 x 5
## screen_name description location followers_count friends_count
## <chr> <chr> <chr> <int> <int>
## 1 Brn634 "Official creator of … "Medellín,… 75537 1864
## 2 gerardbermon "Aunque me encanta ir… "Medellín,… 73278 1491
## 3 HerranzJC "¡Sígueme en YouTube!… "Madrid" 57788 6721
## 4 gergeriin "Peludo & Pachón \U00… "CDMX " 56324 8305
## 5 SebastianG0… "Sólo respondo DM cua… "" 53718 1072
## 6 AndresCamil… "Jefe de Comunicacion… "Bogotá, D… 45249 20320
## 7 MMMaldonadoC "Dimensión jurídica d… "Bogotá" 43383 2434
## 8 AndressVerg… "NO apto para menores… "Cali, Col… 37658 1117
## 9 SoyElTorito… "🚫PERFIL XXX ☢🔞\nCERO… "Cuauhtémo… 36463 14750
## 10 Csuberxx "En cualquier momento… "Barranqui… 34931 2969
## # … with 2,624 more rows
Ultimately we are interested in Danielramirezzr's personal network of followers, so we remove every connection that involves a user outside the 3,197 followers.
edge_list <- edge_list %>%
filter(to %in% ego_followers_info$name) %>%
filter(from %in% ego_followers_info$name)
edge_list
## # A tibble: 89,811 x 2
## from to
## <chr> <chr>
## 1 100049987 2189410279
## 2 100049987 136085801
## 3 100049987 2209542892
## 4 100049987 152714308
## 5 100049987 133387904
## 6 100049987 88534750
## 7 100049987 1030279999889309697
## 8 100049987 180151491
## 9 100049987 114499505
## 10 100049987 1242512791
## # … with 89,801 more rows
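The filtering step above keeps an edge only when both of its endpoints belong to the set of followers. A minimal sketch of the same idea in base R, using a made-up edge list and a hypothetical `keep` vector:

```r
# Hypothetical edge list: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "a", "b", "c", "d"),
  to   = c("b", "z", "c", "a", "a"),
  stringsAsFactors = FALSE
)

# The users we want to keep (in the analysis above: the ego's followers)
keep <- c("a", "b", "c")

# Keep an edge only when both endpoints are in `keep`
inner_edges <- edges[edges$from %in% keep & edges$to %in% keep, ]
print(inner_edges)
```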
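The personal network of followers of Danielramirezzr that we were able to reconstruct draws on 2,634 users with public accounts and 89,811 connections. The graph built from those connections has 2,563 nodes, since a user without any connection inside the network does not appear in the edge list.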
ego_network <- edge_list %>%
  tidygraph::as_tbl_graph() %>%
  ## attach the user metadata to the nodes; `name` holds the user_id at this point
  left_join(ego_followers_info, by = "name") %>%
  rename(name = screen_name, user_id = name) %>%
  select(name, everything())
ego_network
## # A tbl_graph: 2563 nodes and 89811 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 2,563 x 10 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Maur… 100049… en Mauricio Cartage… "Costeño. … 197
## 2 Jair… 100051… und JairoEst… AXM - C… "Más de gu… 88
## 3 Abal… 100087… es Juliana … Colombia "Escribo p… 148
## 4 juan… 100246… und Felipe G… Villeta… "" 132
## 5 leos… 100277… und leo sbro… Bogotá,… "https://t… 482
## 6 Pipe… 100396… es Felipe M… Bogotá,… "Médico - … 1544
## # … with 2,557 more rows, and 3 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>
## #
## # Edge Data: 89,811 x 2
## from to
## <int> <int>
## 1 1 1244
## 2 1 943
## 3 1 1254
## # … with 89,808 more rows
## Descriptive statistics
ego_network <- ego_network %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network
## # A tbl_graph: 2563 nodes and 89811 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 2,563 x 15 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Maur… 100049… en Mauricio Cartage… "Costeño. … 197
## 2 Jair… 100051… und JairoEst… AXM - C… "Más de gu… 88
## 3 Abal… 100087… es Juliana … Colombia "Escribo p… 148
## 4 juan… 100246… und Felipe G… Villeta… "" 132
## 5 leos… 100277… und leo sbro… Bogotá,… "https://t… 482
## 6 Pipe… 100396… es Felipe M… Bogotá,… "Médico - … 1544
## # … with 2,557 more rows, and 8 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, out_degree <dbl>,
## # in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## # eigen_centrality <dbl>
## #
## # Edge Data: 89,811 x 2
## from to
## <int> <int>
## 1 1 1244
## 2 1 943
## 3 1 1254
## # … with 89,808 more rows
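To make the degree columns above more concrete, here is a small sketch on a made-up directed graph using igraph directly (the analysis itself uses the tidygraph wrappers). Since each edge means "from follows to", a user's in-degree is the number of users in the network who follow them.

```r
library(igraph)

# Hypothetical follow relations: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "c", "d")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# In-degree: how many users in the network follow this user
in_deg <- degree(g, mode = "in")
# Out-degree: how many users in the network this user follows
out_deg <- degree(g, mode = "out")

print(rbind(in_deg, out_deg))
```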
The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) against their influence within the personal network of followers (vertical axis, in-degree).
ego_network %>%
as_tibble() %>%
#filter(in_degree > 5) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point()
ego_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
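The labelling rule used in the plot above relies on `rank(-x) <= k` to pick the k largest values of a variable. A short sketch with made-up numbers and hypothetical user names:

```r
followers_count <- c(120, 5400, 87, 990, 15000)
in_degree       <- c(40, 3, 55, 12, 9)
name            <- c("u1", "u2", "u3", "u4", "u5")  # hypothetical handles

k <- 2  # the analysis above uses 10

# rank(-x) gives rank 1 to the largest value, so `<= k` selects the top k
top_any <- rank(-followers_count) <= k | rank(-in_degree) <= k

# Users outside both top-k groups get NA and therefore no label
label <- ifelse(top_any, name, NA_character_)
print(label)
```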
Clusters
set.seed(123)
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
ego_network <- ego_network %>%
left_join(cluster_df)
ego_network %>%
as_tibble() %>%
arrange(desc(in_degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 30) %>%
ggplot(aes(label = name, size = log(in_degree), color = in_degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
labs(title = "Prominent followers in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 7 x 2
## cluster n
## <fct> <int>
## 1 1 720
## 2 2 295
## 3 5 50
## 4 9 144
## 5 11 304
## 6 16 921
## 7 <NA> 129
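The NA row in the table above collects the users whose community had fewer than 10 members: those communities are filtered out before the join, so their users end up without a cluster label. A base R sketch of the same logic with made-up labels and a smaller, hypothetical threshold:

```r
# Hypothetical community labels for eight users
membership <- c(1, 1, 1, 2, 2, 2, 3, 1)
min_size <- 3  # the analysis above uses 10

# Count members per community and find the ones that are large enough
sizes <- table(membership)
large <- names(sizes)[sizes >= min_size]

# Users in small communities get NA instead of a cluster label
cluster <- factor(ifelse(membership %in% large, membership, NA))
print(table(cluster, useNA = "ifany"))
```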
Which users act as “bridges”?
ego_network %>%
as_tibble() %>%
arrange(desc(betweenness)) %>%
select(name, description, location)
## # A tibble: 2,563 x 3
## name description location
## <chr> <chr> <chr>
## 1 Elbuhonejo "Contador pero no de chistes, Cinéfilo, Viajero… "Bogotá, D.C., …
## 2 AndresCami… "Jefe de Comunicaciones y Prensa del Senador @p… "Bogotá, D.C., …
## 3 MaoCelisCa "Tech Lover | MTB" "Colombia"
## 4 ALEJOMICHE… "Abogado/Activista DDHH, trabajando por la Prop… "Bogota Colombi…
## 5 netchmusic "Cantautor. \n\nHacer canciones es lo único qu… "En la Luna "
## 6 DonDanielin "Más ficción que realidad." "Colombia"
## 7 JairoSoto "Yo no miento, exagero. Periodista y barranquil… "Bogotá, D.C., …
## 8 Juandam_m "Manizalita de acento rolo. Ex-gordo. Inventado… "Colombia"
## 9 javro "" "Bogotá, D.C., …
## 10 Elbayabuyi… "Con alma de gordo, intenso, de Sogamoso para e… "Bogotá"
## # … with 2,553 more rows
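Betweenness is the measure used above to rank users as bridges: it counts how many shortest paths between other users pass through a given user. The sketch below uses a made-up, undirected toy graph (the analysis itself uses directed betweenness) in which a single user connects two otherwise separate groups.

```r
library(igraph)

# Two triangles (a-b-c and e-f-g) joined only through user "d"
edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "e", "e", "f"),
  to   = c("b", "c", "c", "d", "e", "f", "g", "g")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Betweenness counts the shortest paths that pass through each node
btw <- betweenness(g)

# The user with the highest score is the bridge between the two groups
bridge <- names(which.max(btw))
print(bridge)
```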
cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")
ego_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 7 x 6
## cluster betweenness in_degree out_degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11 5076. 29.1 29.2 1444. 1381.
## 2 16 4929. 63.1 53.8 1216. 1135.
## 3 1 3632. 22.1 33.0 1329. 2232.
## 4 9 2155. 17.6 15.6 1883. 1920.
## 5 2 1762. 12.1 14.8 1709. 2379.
## 6 5 1392. 10.0 13.1 950. 1547.
## 7 <NA> 1055. 2.05 2.64 341. 645.
Given the information above, we can focus on particular segments of the personal network.
For example, we can focus exclusively on the users who belong to the groups labeled 9 and 11.
ego_network_subset <- ego_network %>%
filter(cluster %in% c(9, 11)) %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = in_degree),
shape = 21, color = "white", show.legend = FALSE)
ego_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, in_degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = authority_score),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(filter = rank(-authority_score) <= 10,
label = name),
repel = TRUE, alpha = 3/4, size = 3)
This section repeats the previous analysis for Danielramirezzr's personal network of friends.
outfolder <- paste0(ego, "_friends_of_friends/")
if (!dir.exists(outfolder)) dir.create(outfolder)
ego_friends <- get_friends(ego, token = sample(token, 1))
ego_friends
## # A tibble: 570 x 2
## user user_id
## <chr> <chr>
## 1 Danielramirezzr 1376071800
## 2 Danielramirezzr 289506642
## 3 Danielramirezzr 555188075
## 4 Danielramirezzr 57666196
## 5 Danielramirezzr 322011448
## 6 Danielramirezzr 576624661
## 7 Danielramirezzr 1730577278
## 8 Danielramirezzr 174443391
## 9 Danielramirezzr 588952390
## 10 Danielramirezzr 142171297
## # … with 560 more rows
## users already downloaded in a previous run (files are named <user_id>.rds)
users_done <- str_replace(dir(outfolder), "\\.rds$", "")
users_left <- setdiff(ego_friends$user_id, users_done)
while (length(users_left) > 0) {
  new_user <- users_left[[1]]
  ## try() keeps the loop running when a call fails; the error object is saved instead
  friends_of_user <- try(multi_get_friends(new_user, token))
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- setdiff(users_left, new_user)
}
In this case, no information could be retrieved for 0.5% of Danielramirezzr's friends.
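edge_list <- list.files(outfolder, full.names = TRUE) %>%
  map(read_rds)
## flag the downloads that failed (e.g., protected accounts) and drop them
error_index <- map_lgl(edge_list, ~ inherits(.x, "try-error"))
edge_list <- edge_list[!error_index] %>% bind_rows()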
edge_list
## # A tibble: 805,134 x 2
## from to
## <chr> <chr>
## 1 1000511132 919976669213020160
## 2 1000511132 752529589755404288
## 3 1000511132 1109856955101790209
## 4 1000511132 95033445
## 5 1000511132 48396652
## 6 1000511132 39955069
## 7 1000511132 1113640592842600448
## 8 1000511132 161106995
## 9 1000511132 2400080066
## 10 1000511132 3001776580
## # … with 805,124 more rows
ego_friends_info <- lookup_users(ego_friends$user_id, token = token)
write_rds(ego_friends_info, paste0(ego, "_friends_info.rds"), compress = "gz")
ego_friends_info <- read_rds(paste0(ego, "_friends_info.rds")) %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
id_dict <- ego_friends_info %>%
select(name, screen_name) %>%
deframe()
This is the information for the friends of Danielramirezzr who have the largest number of followers.
ego_friends_info %>%
arrange(desc(followers_count)) %>%
select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 552 x 5
## screen_name description location followers_count friends_count
## <chr> <chr> <chr> <int> <int>
## 1 elespectador "Noticias de Colombia … Bogotá, C… 5412852 51738
## 2 RevistaSema… "Periodismo con caráct… Colombia 4561113 45
## 3 bbcmundo "Twitter oficial de BB… Londres, … 4176206 420
## 4 petrogustavo "Perfil Oficial del di… ÜT: 4.650… 4008247 2461
## 5 Citytv "Información de Colomb… Bogotá 3063947 3537
## 6 ClaudiaLopez "Primera Alcaldesa de … Bogotá, D… 2471874 2508
## 7 UN_Women "UN Women is the UN en… Worldwide 1929842 4209
## 8 Bogota "Twitter oficial de la… Bogotá, C… 1675261 2741
## 9 SectorMovil… "Información oficial d… Bogotá, C… 1451865 643
## 10 MJDuzan "Periodista" Colombia 1114841 4529
## # … with 542 more rows
edge_list <- edge_list %>%
filter(to %in% ego_friends_info$name) %>%
filter(from %in% ego_friends_info$name)
edge_list
## # A tibble: 26,762 x 2
## from to
## <chr> <chr>
## 1 1000511132 161106995
## 2 1000511132 588952390
## 3 1000511132 2216042148
## 4 1000511132 180151491
## 5 1000511132 86995313
## 6 1000511132 229837949
## 7 1000511132 39431290
## 8 1000511132 78920906
## 9 1000511132 10012122
## 10 1000511132 127988585
## # … with 26,752 more rows
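The personal network of friends of Danielramirezzr that we were able to reconstruct draws on 552 users with public accounts and 26,762 connections. The graph built from those connections has 551 nodes, since a user without any connection inside the network does not appear in the edge list.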
ego_network <- edge_list %>%
  tidygraph::as_tbl_graph() %>%
  ## attach the user metadata to the nodes; `name` holds the user_id at this point
  left_join(ego_friends_info, by = "name") %>%
  rename(name = screen_name, user_id = name) %>%
  select(name, everything())
ego_network
## # A tbl_graph: 551 nodes and 26762 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 551 x 10 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Jair… 100051… und JairoEst… AXM - C… "Más de gu… 88
## 2 bbcm… 100121… es BBC News… Londres… "Twitter o… 4176206
## 3 Nava… 100403… es Esteban … Bogotá,… "Gender & … 584
## 4 Brig… 101486… es Brigitte… Bogotá,… "Naturalme… 115467
## 5 elbi… 101498… es A. Miami, … "" 11
## 6 Marc… 101511… es 𝕞𝕒𝕣𝕔𝕚𝕒𝕟𝕒… Bogotá,… "𝐏𝐮𝐭𝐚 𝐯𝐢𝐫𝐭… 29861
## # … with 545 more rows, and 3 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>
## #
## # Edge Data: 26,762 x 2
## from to
## <int> <int>
## 1 1 153
## 2 1 409
## 3 1 213
## # … with 26,759 more rows
## Descriptive statistics
ego_network <- ego_network %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network
## # A tbl_graph: 551 nodes and 26762 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 551 x 15 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Jair… 100051… und JairoEst… AXM - C… "Más de gu… 88
## 2 bbcm… 100121… es BBC News… Londres… "Twitter o… 4176206
## 3 Nava… 100403… es Esteban … Bogotá,… "Gender & … 584
## 4 Brig… 101486… es Brigitte… Bogotá,… "Naturalme… 115467
## 5 elbi… 101498… es A. Miami, … "" 11
## 6 Marc… 101511… es 𝕞𝕒𝕣𝕔𝕚𝕒𝕟𝕒… Bogotá,… "𝐏𝐮𝐭𝐚 𝐯𝐢𝐫𝐭… 29861
## # … with 545 more rows, and 8 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, out_degree <dbl>,
## # in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## # eigen_centrality <dbl>
## #
## # Edge Data: 26,762 x 2
## from to
## <int> <int>
## 1 1 153
## 2 1 409
## 3 1 213
## # … with 26,759 more rows
The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) against their influence within the personal network of friends (vertical axis, in-degree).
ego_network %>%
as_tibble() %>%
#filter(in_degree > 5) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point()
ego_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
Clusters
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
ego_network <- ego_network %>%
left_join(cluster_df)
ego_network %>%
as_tibble() %>%
arrange(desc(in_degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 50) %>%
ggplot(aes(label = name, size = log(in_degree), color = in_degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
labs(title = "Prominent friends in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 3 x 2
## cluster n
## <fct> <int>
## 1 1 294
## 2 2 254
## 3 <NA> 3
Which users act as “bridges”?
ego_network %>%
as_tibble() %>%
arrange(desc(betweenness)) %>%
select(name, description, location)
## # A tibble: 551 x 3
## name description location
## <chr> <chr> <chr>
## 1 AndresCami… "Jefe de Comunicaciones y Prensa del Senador @p… "Bogotá, D.C., …
## 2 AngelicaLo… "Ciudadana, senadora de Colombia 🇨🇴 Partido Ver… ""
## 3 JairoSoto "Yo no miento, exagero. Periodista y barranquil… "Bogotá, D.C., …
## 4 sergemont "Profesor Asociado en Desarrollo Urbano y Regio… "Bogotá, D.C., …
## 5 Elbayabuyi… "Con alma de gordo, intenso, de Sogamoso para e… "Bogotá"
## 6 ismene2 "Mamerta.🔻" "Colombia"
## 7 angelamrob… "Psicóloga. Mg. en Política Social. En oposició… "Colombia"
## 8 ClaudiaLop… "Primera Alcaldesa de Bogotá. Orgullosa bogota… "Bogotá, DC, Co…
## 9 ALEJOMICHE… "Abogado/Activista DDHH, trabajando por la Prop… "Bogota Colombi…
## 10 MJDuzan "Periodista" "Colombia"
## # … with 541 more rows
cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")
ego_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 3 x 6
## cluster betweenness in_degree out_degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 <NA> 973. 2.67 5 229 216
## 2 1 674. 55.0 61.2 13224. 1188.
## 3 2 619. 41.7 34.4 166824. 1741.
ego_network_subset <- ego_network %>%
filter(!is.na(cluster)) %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = in_degree),
shape = 21, color = "white", show.legend = FALSE)
ego_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, in_degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = authority_score),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(filter = rank(-authority_score) <= 10 | rank(-betweenness) <= 10,
label = name),
repel = TRUE, alpha = 3/4, size = 3)
edge_list1 <- list.files(paste0(ego, "_friends_of_friends/"), full.names = TRUE) %>%
  map(read_rds)
## logical index, so that nothing is dropped by mistake when no download failed
error_index <- map_lgl(edge_list1, ~ inherits(.x, "try-error"))
edge_list1 <- edge_list1[!error_index] %>% bind_rows()
edge_list2 <- list.files(paste0(ego, "_friends_of_followers/"), full.names = TRUE) %>%
  map(read_rds)
error_index <- map_lgl(edge_list2, ~ inherits(.x, "try-error"))
edge_list2 <- edge_list2[!error_index] %>% bind_rows()
mutual_network <- inner_join(
  edge_list1,
  edge_list2,
  by = c("from", "to")
) %>%
  ## keep only edges whose two endpoints both follow the ego and are followed by the ego
  filter(from %in% ego_followers$user_id, to %in% ego_followers$user_id) %>%
  filter(from %in% ego_friends$user_id, to %in% ego_friends$user_id) %>%
  filter(from %in% to, to %in% from)
mutual_network <- mutual_network %>%
mutate(n = 1) %>%
tidytext::cast_sparse(from, to, n) %>%
graph_from_adjacency_matrix(mode = "undirected") %>%
tidygraph::as_tbl_graph()
ego_mutuals_info <- lookup_users(as_tibble(mutual_network)$name, token = sample(token, 1))
ego_mutuals_info <- ego_mutuals_info %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
mutual_network <- mutual_network %>%
inner_join(ego_mutuals_info, by = "name") %>%
rename(name = screen_name, user_id = name) %>%
select(name, everything())
## Descriptive statistics
mutual_network <- mutual_network %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) against their influence within the personal network of mutual connections (vertical axis, in-degree).
mutual_network %>%
as_tibble() %>%
#filter(in_degree > 5) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point()
mutual_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
Clusters
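## note: the communities are detected on ego_network, which at this point holds the friends network from the previous section, and are then matched to mutual_network by name
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)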
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
mutual_network <- mutual_network %>%
left_join(cluster_df)
mutual_network %>%
as_tibble() %>%
arrange(desc(in_degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 50) %>%
ggplot(aes(label = name, size = log(in_degree), color = in_degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
labs(title = "Prominent mutual connections in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
mutual_network %>% as_tibble() %>% count(cluster)
## # A tibble: 3 x 2
## cluster n
## <fct> <int>
## 1 1 264
## 2 2 97
## 3 <NA> 3
Which users act as “bridges”?
mutual_network %>%
as_tibble() %>%
arrange(desc(betweenness))
## # A tibble: 364 x 16
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 mmau… 131959… es mar "Colomb… "marica, r… 6529
## 2 Elba… 789209… es Rodrigo … "Bogotá" "Con alma … 13115
## 3 Lech… 739925… es SS "Bogotá… "el del tu… 2062
## 4 Andr… 102769… es Andrés H… "Bogotá… "Jefe de C… 45250
## 5 miss… 885764… und La Señor… "Bogotá… "Drag Quee… 22796
## 6 Fede… 144805… es Fede ⚡️ "Bogotá… "Guaratora… 5077
## 7 carl… 105281… es Carlos G… "Medell… "disappoin… 8233
## 8 efes… 119873… es Fabián E… "Bogotá" "Arquitect… 20
## 9 teba… 239268… und Esteban "" "«Y estamo… 4445
## 10 d_ib… 394312… es Daniel I… "Bogotá… "Mariachi … 13517
## # … with 354 more rows, and 9 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, out_degree <dbl>,
## # in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## # eigen_centrality <dbl>, cluster <fct>
cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")
mutual_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 3 x 6
## cluster betweenness in_degree out_degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 169. 90.5 90.5 4346. 1242.
## 2 2 85.7 52.6 52.6 2763. 1477.
## 3 <NA> 40.3 21 21 229 216
mutual_network_subset <- mutual_network %>%
filter(!is.na(cluster)) %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
mutual_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = in_degree),
shape = 21, color = "white", show.legend = FALSE)
mutual_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, in_degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
mutual_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = authority_score),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(filter = rank(-authority_score) <= 10 | rank(-betweenness) <= 10,
label = name),
repel = TRUE, alpha = 3/4, size = 3)
mutual_network_subset %>% as_tibble() %>% View
readLines("rtweet_functions.R") %>%
writeLines()
##
## # main functions ----------------------------------------------------------
##
## multi_get_friends <- function(u, token_list) {
##
## user_info <- lookup_users(u, token = sample(token_list, 1)[[1]])
## fc <- user_info$friends_count
## message("<<", user_info$screen_name, ">> is following ", scales::comma(fc), " users ")
##
## if (user_info$protected) stop(call. = FALSE, "The account is protected, we can't get friends.")
##
## num_queries <- ceiling(fc / 5000)
## rl <- rate_limit(token_list, "get_friends")
## rl <- validate_rate_limit(rl, "get_friends", token_list)
##
## index <- get_available_token_index(rl)
##
## # Case 0: User doesn't have any friends
##
## if (fc == 0) return(tibble(from = character(0), to = character(0)))
##
## # Case 1: 5,000 friends or fewer, only one call is needed
##
## if (fc <= 5e3) {
##
## friends <- get_friends(u, token = token_list[[index]])
##
## } else {
##
## # Case 2: Many calls are needed
##
## output <- vector("list", length = num_queries)
## output[[1]] <- get_friends(u, token = token_list[[index]])
##
## for (i in 2:length(output)) {
##
## rl <- validate_rate_limit(rl, "get_friends", token_list)
## index <- get_available_token_index(rl)
## output[[i]] <- get_friends(u, token = token_list[[index]], page = next_cursor(output[[i - 1]]))
##
## }
##
## friends <- bind_rows(output) %>%
## distinct()
##
## }
##
## attr(friends, "next_cursor") <- NULL
##
## friends %>%
## rename(from = user, to = user_id) %>%
## mutate(from = user_info$user_id)
##
## }
##
## multi_get_timeline <- function(u, n, token_list, home = FALSE) {
##
## message(u)
## rl <- rate_limit(token_list, "get_timeline")
## rl <- validate_rate_limit(rl, "get_timeline", token_list)
##
## index <- get_available_token_index(rl)
##
## # Case 0: User doesn't have any posts
##
## # what to do?
##
## # Should we allow to get all the timeline??? If so, mimic previous function
##
## tl <- get_timeline(u, n = n, home = home, token = token_list[[index]])
##
## return(tl)
##
## }
##
## # multi_lookup_users <- function() {
## #
## #
## # }
##
##
## # helpers -----------------------------------------------------------------
##
## validate_rate_limit <- function(rl, q, token_list) {
##
## if (is_empty(rl)) {
## message("Waiting for rate limiting update")
## Sys.sleep(60)
## rl <- rate_limit(token_list, query = q)
## return(validate_rate_limit(rl, q, token_list)) # recursion! (return the refreshed table)
##
## }
##
## if (all(rl$remaining == 0)) {
##
## message("Waiting for token reset in ", round(min(rl$reset), 1), " minutes")
## Sys.sleep(min(as.numeric(rl$reset_at - Sys.time(), units = "secs")) + 5)
## rl <- rate_limit(token_list, query = q)
## return(validate_rate_limit(rl, q, token_list)) # recursion! (return the refreshed table)
##
## }
##
## rl
##
## }
##
## get_available_token_index <- function(rl) {
##
## env <- rlang::caller_env()
## available_token <- rl$remaining > 0
## index <- which(available_token)[[1]]
## env$rl[index, ]$remaining <- rl[index, ]$remaining - 1 # this modifies the rl obj in the parent frame
## return(index)
##
## }
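The helper `get_available_token_index()` above picks the first token that still has calls left and updates the rate-limit table in the caller's frame. As a possible alternative, the sketch below returns the updated table explicitly instead of modifying the caller's variables. The function `use_token()` and the made-up rate-limit table are hypothetical and not part of rtweet or of the original helpers.

```r
# Hypothetical rate-limit table: one row per token
rl <- data.frame(token = c("t1", "t2", "t3"), remaining = c(0, 2, 15))

# Pick the first token with calls left and return the updated table along with it
use_token <- function(rl) {
  available <- which(rl$remaining > 0)
  if (length(available) == 0) stop("no token has calls left")
  index <- available[[1]]
  rl$remaining[index] <- rl$remaining[index] - 1
  list(index = index, rl = rl)
}

# Three consecutive calls: the second token is used until it runs out
step1 <- use_token(rl)
step2 <- use_token(step1$rl)
step3 <- use_token(step2$rl)
used <- c(step1$index, step2$index, step3$index)
print(used)
```

Returning the table makes the bookkeeping visible at the call site, at the cost of having to pass the updated table along by hand.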
theme_custom
## function (base_family = "Avenir Next Condensed", fill = "white", ...) {
## theme_minimal(base_family = base_family, ...) %+replace%
## theme(plot.title = element_text(face = "bold", margin = margin(0,
## 0, 5, 0), hjust = 0, size = 13), plot.subtitle = element_text(face = "italic",
## margin = margin(0, 0, 5, 0), hjust = 0), plot.background = element_rect(fill = fill,
## size = 0), complete = TRUE, axis.title.x = element_text(margin = margin(15,
## 0, 0, 0)), axis.title.y = element_text(angle = 90,
## margin = margin(0, 20, 0, 0)), strip.text = element_text(face = "italic",
## colour = "white"), strip.background = element_rect(fill = "#4C4C4C"))
## }